A New Adaptive Fault-Tolerant Routing Methodology for Direct Networks

Authors

  • María Engracia Gómez
  • José Duato
  • Jose Flich
  • Pedro López
  • Antonio Robles
  • Nils Agne Nordbotten
  • Tor Skeie
  • Olav Lysne
Abstract

Interconnection networks play a key role in the fault tolerance of massively parallel computers, since faults may isolate a large fraction of the machine containing many healthy nodes. In this paper, we present a methodology to design fully adaptive fault-tolerant routing algorithms for direct interconnection networks that can be applied to different regular topologies. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, from this node, they are adaptively forwarded to their destination. This methodology requires only one additional virtual channel, even for tori. Evaluation results show that the methodology is 7-fault tolerant and that, for up to 14 faults, more than 99% of the fault combinations are tolerated, without significantly degrading performance in the presence of faults.
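
The intermediate-node idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the topology is a 2D mesh (not a torus), faults are node faults, and a segment is treated as safe only when no faulty node lies in the bounding rectangle of its endpoints, since a fully adaptive minimal router may visit any node in that rectangle. The function names (`minimal_region`, `segment_is_safe`, `choose_intermediate`) are hypothetical, and virtual channels and deadlock avoidance are not modeled.

```python
from itertools import product


def minimal_region(a, b):
    """Nodes that lie on at least one minimal path between a and b in a 2D mesh."""
    (ax, ay), (bx, by) = a, b
    xs = range(min(ax, bx), max(ax, bx) + 1)
    ys = range(min(ay, by), max(ay, by) + 1)
    return set(product(xs, ys))


def segment_is_safe(a, b, faulty):
    """Assumed rule: a segment is safe if no faulty node is in its minimal region."""
    return minimal_region(a, b).isdisjoint(faulty)


def distance(a, b):
    """Manhattan distance, i.e. the hop count of a minimal path in a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_intermediate(src, dst, faulty, width, height):
    """Return None if src -> dst is safe without an intermediate node.

    Otherwise return a node I such that src -> I and I -> dst are both safe,
    preferring the shortest total hop count. Raise ValueError when no single
    intermediate node works under the assumed safety rule.
    """
    if segment_is_safe(src, dst, faulty):
        return None
    best = None  # (total hop count, node)
    for node in product(range(width), range(height)):
        if node in (src, dst):
            continue
        if segment_is_safe(src, node, faulty) and segment_is_safe(node, dst, faulty):
            cost = distance(src, node) + distance(node, dst)
            if best is None or cost < best[0]:
                best = (cost, node)
    if best is None:
        raise ValueError("no single intermediate node tolerates this fault set")
    return best[1]


if __name__ == "__main__":
    # 4x4 mesh with one faulty node inside the minimal region of the pair.
    print(choose_intermediate((0, 0), (3, 3), {(1, 1)}, 4, 4))
```

Under this conservative rule some fault sets have no valid single intermediate node, for example a fault on the straight line between a source and destination in the same row. The sketch reports that case with an exception and does not attempt a further fallback.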

Related resources

A New Adaptive Fault-Tolerant Protocol for Direct Multiprocessors Networks

This paper investigates the fault tolerance problem in direct networks. Conservative flow control mechanisms such as Pipelined Circuit Switching (PCS) ensure the existence of a path to the destination before transmission. This yields a reliable fault-tolerant system at the expense of performance. Optimistic flow control mechanisms such as Wormhole Switching (WS) realize very good perfo...

An Efficient Fault-Tolerant Routing Methodology for Direct Interconnection Networks

Nowadays, massively parallel computing systems are being built with thousands of nodes. This huge number of nodes significantly affects the probability of failure. Thus, it is critical to keep these systems running even in the presence of failures. The interconnection network plays a key role in the performance achieved by these systems, since failures in the interconnection network may isolate...

CAFT: Cost-aware and Fault-tolerant routing algorithm in 2D mesh Network-on-Chip

The increasing complexity of chips and the need to integrate more components into a single chip have made network-on-chip an important infrastructure for on-chip communication and a good alternative to traditional bus-based designs. As chip density increases, the possibility of failure in the on-chip network increases, and providing correction and fault tol...

An Adaptive LEACH-based Clustering Algorithm for Wireless Sensor Networks

LEACH is the most popular clustering algorithm in Wireless Sensor Networks (WSNs). However, it has two main drawbacks: random selection of cluster heads, and direct communication of cluster heads with the sink. This paper aims to introduce a new centralized cluster-based routing protocol named LEACH-AEC (LEACH with Adaptive Energy Consumption), which guarantees to generate balanced cl...

A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes

Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the presence of failures. Interconnection networks play a key role in these systems, and this paper proposes a fault-tolerant routing methodology for use in such networks. The methodology supports any minimal routing functio...

Publication date: 2004